Abstract:
Data processing is increasingly subject to various internal and external regulations, such as the GDPR, which recently came into effect. Instead of assuming that such processes draw directly on data sources (such as files and relational databases), we approach the problem more abstractly and view these processes as taking datasets as input. These datasets are in turn created by pulling data from various data sources. Starting from a W3C Recommendation for prescribing the structure of datasets and describing them, we investigate an extension of that vocabulary for generating executable R2RML mappings. The result is a top-down approach: one prescribes the dataset to be used by a data process and where to find its data, and that prescription is then used to retrieve the data and create the dataset "just in time". We argue that generating an R2RML mapping from a dataset description is the first step towards policy-aware mappings, where the generation takes regulations into account so that the resulting mappings are compliant. In this paper, we describe how an R2RML mapping can be obtained declaratively from a data structure definition using SPARQL CONSTRUCT queries, and demonstrate this with a running example. Some of the more technical aspects are also described.
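To give a feel for the kind of output involved, the following Python sketch turns a simplified dataset description into an R2RML triples map serialised as Turtle. It is not the paper's method: the abstract generates mappings with SPARQL CONSTRUCT queries over a data structure definition, whereas this sketch uses plain string generation. The `DatasetDescription` structure, the `build_r2rml` helper, the `ex:` namespace and the example table are all hypothetical; only the `rr:` terms come from the R2RML vocabulary.

```python
from dataclasses import dataclass, field
from typing import Dict

# Only the rr: namespace belongs to R2RML; ex: is a placeholder namespace.
PREFIXES = (
    "@prefix rr: <http://www.w3.org/ns/r2rml#> .\n"
    "@prefix ex: <http://example.org/ns#> .\n"
)


@dataclass
class DatasetDescription:
    """Hypothetical, simplified description of a dataset and where its data lives."""
    name: str              # local name for the generated triples map
    table: str             # source relational table
    subject_template: str  # R2RML IRI template identifying each row
    row_class: str         # class (as a CURIE) assigned to each row's subject
    components: Dict[str, str] = field(default_factory=dict)  # predicate CURIE -> column


def build_r2rml(desc: DatasetDescription) -> str:
    """Emit a single R2RML triples map, in Turtle, for the described dataset."""
    lines = [
        f"ex:{desc.name}Map a rr:TriplesMap ;",
        f'    rr:logicalTable [ rr:tableName "{desc.table}" ] ;',
        f'    rr:subjectMap [ rr:template "{desc.subject_template}" ; '
        f"rr:class {desc.row_class} ]",
    ]
    # One predicate-object map per dataset component, each reading one column.
    for predicate, column in desc.components.items():
        lines[-1] += " ;"
        lines.append(
            f"    rr:predicateObjectMap [ rr:predicate {predicate} ; "
            f'rr:objectMap [ rr:column "{column}" ] ]'
        )
    lines[-1] += " ."  # close the triples map statement
    return PREFIXES + "\n" + "\n".join(lines) + "\n"


if __name__ == "__main__":
    employees = DatasetDescription(
        name="Employee",
        table="EMP",
        subject_template="http://example.org/employee/{EMPNO}",
        row_class="ex:Employee",
        components={"ex:name": "ENAME", "ex:department": "DEPTNO"},
    )
    print(build_r2rml(employees))
```

The point of the top-down direction is visible even in this toy: the description states what the dataset should contain and where each part comes from, and the executable mapping is derived from it rather than written by hand.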
Abstract:
Evaluating bias, fairness, and social impact in monolingual language models is a difficult task. The challenge is compounded further when language modeling occurs in a multilingual context. Considering the implications of evaluation biases for large multilingual language models, we situate the discussion of bias evaluation within the wider context of social-scientific research combined with computational work. We highlight three dimensions of developing multilingual bias evaluation frameworks: (1) increasing transparency through documentation, (2) expanding the targets of bias beyond gender, and (3) addressing cultural differences that exist between languages. We further discuss the power dynamics and consequences of training large language models, and recommend that researchers remain cognizant of the ramifications of developing such technologies.
Abstract:
An organisation using personal data should document its data governance processes to maintain and demonstrate compliance with the General Data Protection Regulation (GDPR). As processes evolve, their documentation should reflect these changes, together with an assessment showing ongoing compliance. In this paper, we show how semantic representations of processes help maintain ongoing GDPR compliance through a test-driven approach that generates constraints and checks them for adherence to GDPR requirements. We first check whether all required information has been documented, and then whether it is compliant. We prototype our testing approach on the consent mechanism of a real-world website, and persist the results so that they can be used to generate documentation. We use previously published ontologies to represent processes (GDPRov), consent (GConsent), and the GDPR (GDPRtEXT), with SHACL used to test requirement constraints. Paper and Resources: https://w3id.org/GDPRep/semantic-tests.
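The two-step test described above (first completeness, then compliance) can be sketched in plain Python. The paper expresses its constraints as SHACL shapes over GDPRov and GConsent data; here, purely as an illustrative assumption, a consent record is a dict, and the field names (`data_subject`, `purpose`, `status`, `given_at`, `processing_starts`), the helper names and the two compliance rules are hypothetical rather than terms from those ontologies.

```python
from datetime import datetime
from typing import List

# Hypothetical fields a consent record must document (phase 1).
REQUIRED_FIELDS = ("data_subject", "purpose", "status", "given_at", "processing_starts")


def missing_fields(record: dict) -> List[str]:
    """Phase 1: report required information that has not been documented."""
    return [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]


def compliance_violations(record: dict) -> List[str]:
    """Phase 2: check documented values against illustrative rules."""
    violations = []
    if record["status"] != "given":
        violations.append("consent has not been given")
    if record["given_at"] > record["processing_starts"]:
        violations.append("processing starts before consent was given")
    return violations


def check_record(record: dict) -> dict:
    """Run phase 2 only when phase 1 passes, mirroring the two-step approach."""
    missing = missing_fields(record)
    if missing:
        return {"documented": False, "missing": missing, "violations": []}
    return {"documented": True, "missing": [], "violations": compliance_violations(record)}


if __name__ == "__main__":
    record = {
        "data_subject": "alice",
        "purpose": "newsletter",
        "status": "given",
        "given_at": datetime(2018, 5, 1),
        "processing_starts": datetime(2018, 5, 2),
    }
    print(check_record(record))
```

Separating the phases keeps the two failure modes distinct: an undocumented record is reported as incomplete rather than as non-compliant, which matters when the results are persisted as documentation.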
Abstract:
The General Data Protection Regulation (GDPR) is the new European data protection law, and complying with it affects organisations in several aspects related to the use of consent and personal data. With emerging research and innovation in data management solutions claiming to assist with various provisions of the GDPR, comparing the degree and scope of such solutions is a challenge without a way to consolidate them. With the GDPR available as a linked data resource, it is possible to link together information and approaches addressing specific articles, and thereby compare them. Organisations can take advantage of this by linking queries and results directly to the relevant text, making it possible to record and measure their solutions for compliance with specific obligations. GDPR text extensions (GDPRtEXT) uses the European Legislation Identifier (ELI) ontology published by the European Publications Office to expose the GDPR as linked data. The dataset is published using DCAT and includes an online webpage with HTML id attributes for each article and its subpoints. A SKOS vocabulary is provided that links concepts with the relevant text in the GDPR. To demonstrate how related legislation can be linked to highlight the changes between them, so that existing approaches can be reused, we provide a mapping from the Data Protection Directive (DPD), the previous data protection law, to the GDPR, showing the nature of the changes between the two. We also briefly discuss the existing body of research that can benefit from adopting this resource.
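To illustrate why linking results to article-level identifiers makes solutions comparable, the following sketch attaches each compliance check to the GDPR provision it addresses and then summarises coverage per provision. The base IRI and the fragment scheme (`article-N`, `article-N-M`) are hypothetical placeholders, not the identifiers GDPRtEXT actually publishes, and the check names are invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical base IRI and fragment scheme; not GDPRtEXT's actual identifiers.
GDPR_BASE = "https://example.org/gdpr#"


def provision_iri(article: int, point: Optional[int] = None) -> str:
    """Build an IRI for a GDPR article, or for a numbered point within it."""
    fragment = f"article-{article}" if point is None else f"article-{article}-{point}"
    return GDPR_BASE + fragment


@dataclass(frozen=True)
class CheckResult:
    name: str       # identifier of the query or test that was run
    provision: str  # IRI of the GDPR provision the check addresses
    passed: bool


def coverage_by_provision(results: Iterable[CheckResult]) -> Dict[str, Tuple[int, int]]:
    """Map each provision IRI to (checks passed, checks run)."""
    counts = defaultdict(lambda: [0, 0])
    for result in results:
        counts[result.provision][1] += 1
        if result.passed:
            counts[result.provision][0] += 1
    return {iri: (passed, run) for iri, (passed, run) in counts.items()}


if __name__ == "__main__":
    results = [
        CheckResult("consent-recorded", provision_iri(7), True),
        CheckResult("consent-form-shown", provision_iri(7), False),
        CheckResult("withdrawal-offered", provision_iri(7, 3), False),
    ]
    for iri, (passed, run) in coverage_by_provision(results).items():
        print(f"{iri}: {passed}/{run} checks passed")
```

Because every result carries a provision IRI, results produced by different tools can be merged and compared per article without agreeing on anything beyond the identifiers.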
Abstract:
Information and communications technology (ICT) encompasses all technologies that facilitate the processing, transfer and exchange of information and communication services, including the technology used to store, manipulate, distribute and create information. Marzelle (quoted in UNDP) suggests that ICTs include both traditional media (such as radio, television, dance, drama, folklore, print and fax) and new devices (such as the Internet, the World Wide Web, electronic mail, teleconferencing and distance learning tools including CD-ROMs, hypertext and the virtual classroom). ICT has continued to improve the way we live, work, interact with our environment and perceive our lives. The proliferation of ICTs has also played a key role in changing the lifestyles of many people, including older adults, in recent times. Further, ICT has been shown [1] to have considerable potential to boost economic growth and promote international development, and ICTs were the main drivers of economic growth in African countries over the period 2007-2016 [2]. It has also been suggested that technology plays an important role in driving the development of the information society and economy in developing countries, with many African countries equally well placed to take advantage of technology to facilitate socioeconomic development. Technology is also widely recognized as having the potential to improve environmental performance and tackle climate change. Moreover, it is claimed that ICT provides the bedrock for survival and development in a rapidly changing global environment. However, climate change and global warming represent a complex set of challenges and long-term problems that require collaborative solutions and the engagement of all countries on the planet. For this reason, a cohesive and coordinated international ICT policy is required to respond to this emerging global reality and avert continuing environmental degradation.
Abstract:
With the increasing scale of online cultural heritage collections, manually annotating their contents becomes a challenging and costly endeavour. Entity Linking is a process used to apply such annotations automatically to a text-based collection, where the quality and coverage of the linking process depend heavily on the knowledge base that informs it. In this paper, we present our ongoing efforts to annotate a corpus of 17th-century Irish witness statements using Entity Linking methods that utilise Semantic Web techniques. We discuss the problems faced in this process and our attempts to remedy them.
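The dependence of linking quality on the knowledge base shows up even in the simplest form of entity linking, a gazetteer lookup. The sketch below scans a tokenised text, matches the longest known surface form at each position against a tiny knowledge base, and leaves every other mention unlinked. This exact-match approach is a stand-in chosen for illustration, not the Semantic Web-based methods the paper uses, and the knowledge base entries and IRIs are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical knowledge base: lower-cased surface form -> entity IRI.
KB: Dict[str, str] = {
    "dublin": "http://example.org/place/Dublin",
    "armagh": "http://example.org/place/Armagh",
    "county armagh": "http://example.org/place/CountyArmagh",
}


def link_entities(text: str, kb: Dict[str, str] = KB) -> List[Tuple[str, str]]:
    """Greedy longest-match linking; returns (mention, IRI) pairs in text order.

    Mentions are rebuilt from word tokens, so punctuation is dropped.
    """
    tokens = re.findall(r"\w+", text)
    max_len = max((len(label.split()) for label in kb), default=0)
    links, i = [], 0
    while i < len(tokens):
        # Try the longest candidate span first, so "County Armagh" wins over "Armagh".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            mention = " ".join(tokens[i:i + n])
            iri = kb.get(mention.lower())
            if iri is not None:
                links.append((mention, iri))
                i += n
                break
        else:
            i += 1  # no known entity starts at this token
    return links


if __name__ == "__main__":
    print(link_entities("He was robbed near Dublin and fled to County Armagh, then Galway."))
```

In the usage line, "Galway" is never linked simply because the knowledge base lacks it: coverage is bounded by the knowledge base, however good the matching step is.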
Abstract:
Gender bias is a frequent occurrence in NLP-based applications and is especially pronounced in gender-inflected languages. Bias can appear through associations of certain adjectives and animate nouns with the natural gender of referents, but also through unbalanced grammatical gender frequencies of inflected words. This type of bias becomes more evident when generating conversational utterances in which gender is not specified within the sentence, because most current NLP applications still work with sentence-level context. As a step towards more inclusive NLP, this paper proposes an automatic and generalisable rewriting approach for short conversational sentences. The rewriting method can be applied to sentences that, without extra-sentential context, have multiple equivalent alternatives in terms of gender. The method can be used both to create gender-balanced outputs and to create gender-balanced training data. The proposed approach is based on a neural machine translation (NMT) system trained to 'translate' from one gender alternative to another. Both automatic and manual analysis of the approach show promising results for the automatic generation of gender alternatives for conversational sentences in Spanish.
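One way to picture training an NMT system to "translate" between gender alternatives is as ordinary parallel-data preparation in which source and target are in the same language. The sketch below, built on our own assumptions rather than details from the paper, turns aligned masculine/feminine sentence pairs into bidirectional training examples and prefixes each source with a target-gender tag, a common convention for steering multi-target NMT models. The tag tokens and the helper name are hypothetical.

```python
from typing import Iterable, List, Tuple

# Hypothetical control tokens telling the model which gender to produce.
TO_FEM = "<2fem>"
TO_MASC = "<2masc>"


def make_training_pairs(pairs: Iterable[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Turn (masculine, feminine) alternatives into tagged source->target examples.

    Each aligned pair yields one example per direction, so the resulting data
    contains as many feminine targets as masculine ones.
    """
    examples = []
    for masc, fem in pairs:
        if masc == fem:
            continue  # no gender alternative exists, so there is nothing to learn
        examples.append((f"{TO_FEM} {masc}", fem))
        examples.append((f"{TO_MASC} {fem}", masc))
    return examples


if __name__ == "__main__":
    aligned = [
        ("Estoy cansado.", "Estoy cansada."),
        ("Soy el nuevo profesor.", "Soy la nueva profesora."),
        ("Hola.", "Hola."),
    ]
    for source, target in make_training_pairs(aligned):
        print(f"{source}\t{target}")
```

A sentence such as "Estoy cansado." ("I am tired.") fits the abstract's criterion: without extra-sentential context it has an equally valid feminine alternative, so emitting both directions keeps the rewriting model from favouring either gender.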
Abstract:
By interlinking internal Linked Data (LD) entities to related LD entities published by authoritative creators and holders of data, libraries have the potential to expose their collections to a larger audience and to enable richer user searches. While an increasing number of libraries are devoting time to publishing LD, the full potential of these datasets has not been explored owing to limited LD interlinking. In 2018 we conducted a survey that explored the position of Information Professionals (IPs), such as librarians, archivists and cataloguers, with regard to LD. The results indicated that IPs find data interlinking a particularly challenging step in the creation of Five Star LD. Consequently, we developed NAISC, an interlinking approach designed specifically for the library domain, aimed at facilitating greater IP engagement in the LD interlinking process. Our paper provides an overview of the design and user evaluation of NAISC. The results indicated that IPs found NAISC easy to use and useful for creating LD interlinks.